Add CMakePresets for target micro arch #1348

Open
AntoinePrv wants to merge 12 commits into xtensor-stack:master from AntoinePrv:cmake-presets

AntoinePrv (Contributor) commented May 13, 2026

I've taken the direction of explicit flags such as -mavx -mno-avx2.
IMHO this is less error-prone and more accurate than using an architecture name such as haswell.
The main difference is that explicit flags do not enable other feature flags or change the -mtune model.
For a test setting, accuracy is more important IMHO.

Comment thread .github/workflows/linux.yml Outdated
serge-sans-paille (Contributor)

I really like your approach and will eagerly merge it once it validates \o/

AntoinePrv (Contributor, Author)

I've kept only the micro-architecture target in CMakePresets.json, because combining it with the other dimensions (Debug/Release, xtl on/off, ...) results in a combinatorial explosion of presets, and presets currently offer no support for combining such dimensions.
Another shortcoming is that presets cannot select flags based on the compiler, so we cannot pick MSVC-specific flags here. We can select based on the OS, but that is not quite the same.

I have ongoing work to do the same thing as these presets at the CMake level, with a function that could be made available to users to help with dynamic dispatch tooling (our current solution in Arrow is very verbose).
In that case, we would also need to define a safe -march baseline, because the translation units compiled for a given target may also contain non-SIMD code (this is sometimes the case in Arrow). With very advanced instruction sets, keeping an x86-64 baseline for that code leaves performance on the table. But what would be a reasonable baseline when dynamically dispatching to, for example, AVX2?

  • haswell (first with AVX2) also has FMA3 and BMI2
  • -march=haswell -mno-fma -mno-bmi2, if that is a thing?
  • Or go further back? sandybridge (first with AVX)? nehalem (first with SSE4.2)?

AntoinePrv force-pushed the cmake-presets branch 4 times, most recently from 0d3d6ea to c573d9d, on May 20, 2026 07:17
AntoinePrv (Contributor, Author)

@serge-sans-paille this is in a ready state, but I am not fully happy with it.

Once AVX512 and AVX512-256 come into play, the combinatorial explosion of possibilities starts to show again.
Inheriting flags from other presets and adding to them is also not possible.

This reinforces my belief that I should continue the work of doing this in CMake: the result could also be installed for our users to improve our dynamic dispatch tooling, and it could be unified with the TARGET_ARCH variable used by the tests.

This PR is not completely worthless, though. For example, we can now really test with AVX512F alone, which was not possible before because no Intel architecture is limited to the F subset only.

What do you think? Should we give this some mileage before I get the time to work on a CMake solution?

Comment thread .github/workflows/linux.yml
fi
if [[ '${{ matrix.sys.flags }}' == 'i386' ]]; then
CXX_FLAGS="$CXX_FLAGS -m32"
export CXXFLAGS="$CXXFLAGS -m32"
Contributor

!!!

Contributor Author

Yes, there is a weird mismatch in master. Both CXX_FLAGS and CXXFLAGS were set, but only CXX_FLAGS was explicitly passed to CMake. CMake picks up CXXFLAGS automatically from the environment, but it was not exported.

Comment thread .github/workflows/linux.yml
Comment thread CMakePresets.json
Comment thread CMakePresets.json
{
"name": "avx2",
"cacheVariables": {
"CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx2 -mno-avx512f"
Contributor

We sometimes fall back from AVX2 instructions to SSE instructions. How can this work?

Contributor

I do understand the need to prune higher instruction sets, but not the need to prune lower ones, please explain.

Contributor Author

Do you mean -mno-sse4a? It can be removed. I added it while debugging a -march=native that was being added because TARGET_ARCH was not set.

It is not a problem here, though: SSE4a is an AMD extension that was never implemented on Intel CPUs (which is why it was failing in SDE, Intel's Software Development Emulator).

serge-sans-paille (Contributor) left a comment

All looks good, except the question on pruning lower architectures which raises a big unknown to me.

Labels

None yet

Projects

None yet

2 participants